{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.1"
    },
    "pycharm": {
      "stem_cell": {
        "cell_type": "raw",
        "metadata": {
          "collapsed": false
        },
        "source": []
      }
    },
    "colab": {
      "name": "3.13_dropout.ipynb",
      "provenance": []
    },
    "accelerator": "GPU"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "collapsed": true,
        "id": "EAePZMkOL2zd",
        "pycharm": {
          "is_executing": false
        }
      },
      "source": [
        "# 3.13 丢弃法\n",
        "除了前一节介绍的权重衰减以外，深度学习模型常常使用丢弃法（dropout）[1] 来应对过拟合问题。丢弃法有一些不同的变体。本节中提到的丢弃法特指倒置丢弃法（inverted dropout）\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "o2vTywd1L2zf"
      },
      "source": [
        "根据丢弃法的定义，我们可以很容易地实现它。下面的dropout函数将以drop_prob的概率丢弃NDArray输入X中的元素。"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "8FnjbKb_CQAa"
      },
      "source": [
        "## 3.13.2. 从零开始实现"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "hr6WElJSL2zg",
        "pycharm": {
          "is_executing": true
        }
      },
      "outputs": [],
      "source": [
        "import tensorflow as tf\n",
        "import numpy as np\n",
        "from tensorflow import keras, nn， losses\n",
        "from tensorflow.keras.layers import Dropout, Flatten, Dense"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "76kX5DVN6Gsk"
      },
      "outputs": [],
      "source": [
        "def dropout(X, drop_prob):\n",
        "    assert 0 <= drop_prob <= 1\n",
        "    keep_prob = 1 - drop_prob\n",
        "    # 这种情况下把全部元素都丢弃\n",
        "    if keep_prob == 0:\n",
        "        return tf.zeros_like(X)\n",
        "    #初始mask为一个bool型数组，故需要强制类型转换\n",
        "    mask = tf.random.uniform(shape=X.shape, minval=0, maxval=1) < keep_prob\n",
        "    return tf.cast(mask, dtype=tf.float32) * tf.cast(X, dtype=tf.float32) / keep_prob"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 12,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 68
        },
        "colab_type": "code",
        "id": "zumVlyqoL2zi",
        "outputId": "6b6357ea-c114-417c-d8a1-dc3c884f1ca2",
        "pycharm": {
          "is_executing": false
        }
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tf.Tensor: id=65, shape=(2, 8), dtype=float32, numpy=\n",
              "array([[ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.],\n",
              "       [ 8.,  9., 10., 11., 12., 13., 14., 15.]], dtype=float32)>"
            ]
          },
          "execution_count": 12,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "X = tf.reshape(tf.range(0, 16), shape=(2, 8))\n",
        "dropout(X, 0)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 13,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 68
        },
        "colab_type": "code",
        "id": "EHx1SBztL2zl",
        "outputId": "c7770ec5-3e32-42b2-8953-9ecde4aecc4e",
        "pycharm": {
          "is_executing": false
        }
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tf.Tensor: id=79, shape=(2, 8), dtype=float32, numpy=\n",
              "array([[ 0.,  2.,  0.,  0.,  8.,  0.,  0.,  0.],\n",
              "       [ 0.,  0., 20., 22.,  0., 26.,  0., 30.]], dtype=float32)>"
            ]
          },
          "execution_count": 13,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dropout(X, 0.5)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 14,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 68
        },
        "colab_type": "code",
        "id": "FPs6YPXiL2zn",
        "outputId": "435340ab-3bb7-47a4-9bae-8949e63e6194",
        "pycharm": {
          "is_executing": false
        }
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tf.Tensor: id=80, shape=(2, 8), dtype=int32, numpy=\n",
              "array([[0, 0, 0, 0, 0, 0, 0, 0],\n",
              "       [0, 0, 0, 0, 0, 0, 0, 0]], dtype=int32)>"
            ]
          },
          "execution_count": 14,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dropout(X, 1)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "hpx4rjmZL2zp"
      },
      "source": [
        "### 3.13.2.1. 定义模型参数¶"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "lgno77Th8-UF"
      },
      "outputs": [],
      "source": [
        "num_inputs, num_outputs, num_hiddens1, num_hiddens2 = 784, 10, 256, 256\n",
        "\n",
        "W1 = tf.Variable(tf.random.normal(stddev=0.01, shape=(num_inputs, num_hiddens1)))\n",
        "b1 = tf.Variable(tf.zeros(num_hiddens1))\n",
        "W2 = tf.Variable(tf.random.normal(stddev=0.1, shape=(num_hiddens1, num_hiddens2)))\n",
        "b2 = tf.Variable(tf.zeros(num_hiddens2))\n",
        "W3 = tf.Variable(tf.random.truncated_normal(stddev=0.01, shape=(num_hiddens2, num_outputs)))\n",
        "b3 = tf.Variable(tf.zeros(num_outputs))\n",
        "\n",
        "params = [W1, b1, W2, b2, W3, b3]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "collapsed": true,
        "id": "klSup1CwL2zs"
      },
      "source": [
        "### 3.13.2.2. 定义模型\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "H3-N6TtAL2zs"
      },
      "outputs": [],
      "source": [
        "drop_prob1, drop_prob2 = 0.2, 0.5\n",
        "\n",
        "def net(X, is_training=False):\n",
        "    X = tf.reshape(X, shape=(-1,num_inputs))\n",
        "    H1 = tf.nn.relu(tf.matmul(X, W1) + b1)\n",
        "    if is_training:# 只在训练模型时使用丢弃法\n",
        "      H1 = dropout(H1, drop_prob1)  # 在第一层全连接后添加丢弃层\n",
        "    H2 = nn.relu(tf.matmul(H1, W2) + b2)\n",
        "    if is_training:\n",
        "      H2 = dropout(H2, drop_prob2)  # 在第二层全连接后添加丢弃层\n",
        "    return tf.math.softmax(tf.matmul(H2, W3) + b3)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "sKdzqexCCCce"
      },
      "source": [
        "### 3.13.2.3. 训练和测试模型"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 22,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 173
        },
        "colab_type": "code",
        "id": "Od5pa1veDmr3",
        "outputId": "0d8081da-5cba-41df-d454-419886568202"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz\n",
            "32768/29515 [=================================] - 0s 0us/step\n",
            "Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz\n",
            "26427392/26421880 [==============================] - 1s 0us/step\n",
            "Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz\n",
            "8192/5148 [===============================================] - 0s 0us/step\n",
            "Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz\n",
            "4423680/4422102 [==============================] - 0s 0us/step\n"
          ]
        }
      ],
      "source": [
        "from tensorflow.keras.datasets import fashion_mnist\n",
        "\n",
        "batch_size=256\n",
        "(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()\n",
        "x_train = tf.cast(x_train, tf.float32) / 255 #在进行矩阵相乘时需要float型，故强制类型转换为float型\n",
        "x_test = tf.cast(x_test,tf.float32) / 255 #在进行矩阵相乘时需要float型，故强制类型转换为float型\n",
        "train_iter = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(batch_size)\n",
        "test_iter = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(batch_size)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "aDtX08s3D_n4"
      },
      "outputs": [],
      "source": [
        "# 描述,对于tensorflow2中，比较的双方必须类型都是int型，所以要将输出和标签都转为int型\n",
        "def evaluate_accuracy(data_iter, net):\n",
        "    acc_sum, n = 0.0, 0\n",
        "    for _, (X, y) in enumerate(data_iter):\n",
        "        y = tf.cast(y,dtype=tf.int64)\n",
        "        acc_sum += np.sum(tf.cast(tf.argmax(net(X), axis=1), dtype=tf.int64) == y)\n",
        "        n += y.shape[0]\n",
        "    return acc_sum / n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "DNmVaxxmCoPk"
      },
      "outputs": [],
      "source": [
        "def train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size,\n",
        "              params=None, lr=None, trainer=None):\n",
        "    global sample_grads\n",
        "    for epoch in range(num_epochs):\n",
        "        train_l_sum, train_acc_sum, n = 0.0, 0.0, 0\n",
        "        for X, y in train_iter:\n",
        "            with tf.GradientTape() as tape:\n",
        "                y_hat = net(X, is_training=True)\n",
        "                l = tf.reduce_sum(loss(y_hat, tf.one_hot(y, depth=10, axis=-1, dtype=tf.float32)))\n",
        "            \n",
        "            grads = tape.gradient(l, params)\n",
        "            if trainer is None:\n",
        "                \n",
        "                sample_grads = grads\n",
        "                params[0].assign_sub(grads[0] * lr)\n",
        "                params[1].assign_sub(grads[1] * lr)\n",
        "            else:\n",
        "                trainer.apply_gradients(zip(grads, params))  # “softmax回归的简洁实现”一节将用到\n",
        "\n",
        "            y = tf.cast(y, dtype=tf.float32)\n",
        "            train_l_sum += l.numpy()\n",
        "            train_acc_sum += tf.reduce_sum(tf.cast(tf.argmax(y_hat, axis=1) == tf.cast(y, dtype=tf.int64), dtype=tf.int64)).numpy()\n",
        "            n += y.shape[0]\n",
        "        test_acc = evaluate_accuracy(test_iter, net)\n",
        "        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f'\n",
        "              % (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc))"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 36,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 102
        },
        "colab_type": "code",
        "id": "6mDplouQL2zu",
        "outputId": "f66eb9e4-93c0-4c6b-c994-3cd65f2d1014",
        "pycharm": {
          "is_executing": false
        }
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "epoch 1, loss 0.0369, train acc 0.530, test acc 0.663\n",
            "epoch 2, loss 0.0261, train acc 0.658, test acc 0.704\n",
            "epoch 3, loss 0.0232, train acc 0.694, test acc 0.727\n",
            "epoch 4, loss 0.0215, train acc 0.714, test acc 0.741\n",
            "epoch 5, loss 0.0204, train acc 0.727, test acc 0.748\n"
          ]
        }
      ],
      "source": [
        "num_epochs, lr, batch_size = 5, 0.5, 256\n",
        "loss = tf.losses.CategoricalCrossentropy()\n",
        "train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size,\n",
        "              params, lr)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "collapsed": true,
        "id": "aL4Q9QAZL2zw"
      },
      "source": [
        "## 3.13.3 简洁实现\n",
        "在Tensorflow2.0中，我们只需要在全连接层后添加Dropout层并指定丢弃概率。在训练模型时，Dropout层将以指定的丢弃概率随机丢弃上一层的输出元素；在测试模型时（即model.eval()后），Dropout层并不发挥作用。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 37,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 241
        },
        "colab_type": "code",
        "id": "2wqwvxNSL2zw",
        "outputId": "2cf5e1a7-483f-4e32-fb20-1a5d9c084cde"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Train on 60000 samples, validate on 10000 samples\n",
            "Epoch 1/5\n",
            "60000/60000 [==============================] - 2s 28us/sample - loss: 0.6704 - accuracy: 0.7647 - val_loss: 0.4495 - val_accuracy: 0.8352\n",
            "Epoch 2/5\n",
            "60000/60000 [==============================] - 1s 15us/sample - loss: 0.4306 - accuracy: 0.8447 - val_loss: 0.4112 - val_accuracy: 0.8525\n",
            "Epoch 3/5\n",
            "60000/60000 [==============================] - 1s 14us/sample - loss: 0.3902 - accuracy: 0.8590 - val_loss: 0.3898 - val_accuracy: 0.8588\n",
            "Epoch 4/5\n",
            "60000/60000 [==============================] - 1s 15us/sample - loss: 0.3618 - accuracy: 0.8687 - val_loss: 0.3590 - val_accuracy: 0.8713\n",
            "Epoch 5/5\n",
            "60000/60000 [==============================] - 1s 14us/sample - loss: 0.3423 - accuracy: 0.8756 - val_loss: 0.3617 - val_accuracy: 0.8718\n"
          ]
        },
        {
          "data": {
            "text/plain": [
              "<tensorflow.python.keras.callbacks.History at 0x7ff37d6afd30>"
            ]
          },
          "execution_count": 37,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "model = keras.Sequential([\n",
        "    keras.layers.Flatten(input_shape=(28, 28)),\n",
        "    keras.layers.Dense(256,activation='relu'),\n",
        "    Dropout(0.2),\n",
        "    keras.layers.Dense(256,activation='relu'),\n",
        "    Dropout(0.5),\n",
        "    keras.layers.Dense(10,activation=tf.nn.softmax)\n",
        "])\n",
        "model.compile(optimizer=tf.keras.optimizers.Adam(),\n",
        "              loss = 'sparse_categorical_crossentropy',\n",
        "              metrics=['accuracy'])\n",
        "model.fit(x_train,y_train,epochs=5,batch_size=256,validation_data=(x_test, y_test),\n",
        "                    validation_freq=1)\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "collapsed": true,
        "id": "rhoznDzbL2zy"
      },
      "source": [
        "小结\n",
        "我们可以通过使用丢弃法应对过拟合。\n",
        "丢弃法只在训练模型时使用。\n",
        "\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "sFHN8HzQL2zz"
      },
      "outputs": [],
      "source": []
    }
  ]
}